Mathematical population genetics is only one of Kingman's many research
interests. Nevertheless, his contribution to this field has been crucial,
and moved it in several important new directions. Here we outline
some aspects of his work which have had a major influence on population
genetics theory.
1 Introduction

In the early years of the previous century, the main aim of population
genetics theory was to validate the Darwinian theory of evolution, using
the Mendelian hereditary mechanism as the vehicle for determining
how the characteristics of any daughter generation depended on the
corresponding characteristics of the parental generation. By the 1960s,
however, that aim had been achieved, and the theory largely moved in
a new, retrospective and statistical, direction.

This happened because, at that time, data on the genetic constitution
of a population, or at least on a sample of individuals from that
population, started to become available. What could be inferred about
the past history of the population leading to these data? Retrospective
questions of this type include: "How do we estimate the time at which
mitochondrial Eve, the woman whose mitochondrial DNA is the most
recent ancestor of the mitochondrial DNA currently carried in the human
population, lived? How can contemporary genetic data be used to
track the 'Out of Africa' migration? How do we detect signatures of past
selective events in our contemporary genomes?" Kingman's famous coalescent
theory became a central vehicle for addressing questions such
as these. The very success of coalescent theory has, however, tended to
obscure Kingman's other contributions to population genetics theory. In
this note we review his various contributions to that theory, showing how
coalescent theory arose, perhaps naturally, from his earlier contributions.



2 Background 

Kingman attended lectures in genetics at Cambridge in about 1960, 
and his earliest contributions to population genetics date from 1961. It 
was well known at that time that in a randomly mating population for 
which the fitness of any individual depended on his genetic make-up at 
a single gene locus, the mean fitness of the population increased from 
one generation to the next, or at least remained constant, if only two 
possible alleles, or gene types, often labelled $A_1$ and $A_2$, were possible at
that gene locus. However, it was well known that more than two alleles 
could arise at some loci (witness the ABO blood group system, admitting
three possible alleles, A, B and O). Proving that in this case the mean
population fitness is non-decreasing in time under random mating is far
less easy. This was conjectured by Mandel and Hughes (1958)
and proved in the 'symmetric' case by Scheuer and Mandel (1959) and 
Mulholland and Smith (1959), and more generally by Atkinson et al. 
(1960) and (very generally) Kingman (1961a, b). Despite this success,
Kingman then focused his research in areas quite different from genetics 
for the next fifteen years. The aim of this paper is to document some 
of his work following his re-emergence into the genetics field, dating 
from 1976. Both of us were honoured to be associated with him in this 
work. Neither of us can remember the precise details, but the three-way 
interaction between the UK, the USA and Australia, carried out mainly 
by the now out-of-date flimsy blue aerogrammes, must have started in 
1976, and continued during the time of Kingman's intense involvement 
in population genetics. This note is a personal account, focusing on this 
interaction: many others were working in the field at the same time.
One of Kingman's research activities during the period 1961-1976
leads to our first 'background' theme. In 1974 he established (Kingman,
1975) a surprising and beautiful result, found in the context of storage
strategies. It is well known that the symmetric $K$-dimensional Dirichlet
distribution

\[
\frac{\Gamma(K\alpha)}{\Gamma(\alpha)^K}\,(x_1 x_2 \cdots x_K)^{\alpha-1}\, dx_1\, dx_2 \cdots dx_{K-1}, \qquad (2.1)
\]

where $x_i > 0$, $\sum x_j = 1$, does not have a non-trivial limit as $K \to \infty$, for
given fixed $\alpha$. Despite this, if we let $K \to \infty$ and $\alpha \to 0$ in such a way that
the product $K\alpha$ remains fixed at a constant value $\theta$, then the distribution
of the order statistics $x_{(1)} \ge x_{(2)} \ge x_{(3)} \ge \cdots$ converges to a
non-degenerate limit. (The parameter $\theta$ will turn out to have an important
genetical interpretation, as discussed below.) Kingman called this the
Poisson-Dirichlet distribution, but we suggest that its true author be 
honoured and that it be called the 'Kingman distribution'. We refer to 
it by this name in this paper. So important has the distribution become 
in mathematics generally that a book has been written devoted entirely 
to it (Feng, 2010). This distribution has a rather complex form, and 
aspects of this form are given below. 

The Kingman distribution appears, at first sight, to have nothing to 
do with population genetics theory. However, as we show below, it turns 
out, serendipitously, to be central to that theory. To see why this is so, 
we turn to our second 'background' theme, namely the development of 
population theory in the 1960s and 1970s. 

The nature of the gene was discovered by Watson and Crick in 1953. 
For our purposes the most important of their results is the fact that 
a gene is in effect a DNA sequence of, typically, some 5000 bases, each 
base being one of four types, A, G, C or T. Thus the number of types, or 
alleles, of a gene consisting of 5000 bases is $4^{5000}$. Given this number, we
may for many practical purposes suppose that there are infinitely many 
different alleles possible at any gene locus. However, gene sequencing 
methods took some time to develop, and little genetic information at the 
fundamental DNA level was available for several decades after Watson 
and Crick. 

The first attempt at assessing the degree of genetic variation from one 
person to another in a population at a less fundamental level depended 
on the technique of gel electrophoresis, developed in the 1960s. In loose 
terms, this method measures the electric charge on a gene, with the 
charge levels usually thought of as taking integer values only. Genes 
having different electric charges are of different allelic types, but it can
well happen that genes of different allelic types have the same electric 
charge. Thus there is no one-to-one relation between charge level and 
allelic type. A simple mutation model assumes that a mutant gene has 
a charge differing from that of its parent gene by ±1. We return
to this model in a moment. 

In 1974 Kingman travelled to Australia, and while there met Pat 
Moran (as it happens, the PhD supervisor of both authors of this pa- 
per), who was working at that time on this 'charge-state' model. The 
two of them discussed the properties of a stochastic model involving a 
population of N individuals, and hence 2N genes at any given locus. 
The population is assumed to evolve by random sampling: any daugh- 
ter generation of genes is found by sampling, with replacement, from 
the genes from the parent generation. (This is the well-known 'Wright- 
Fisher' model of population genetics, introduced into the population 
genetics literature independently by Wright (1931) and Fisher (1922).) 
Further, each daughter generation gene is assumed to inherit the same
charge as that of its parent with probability $1 - u$, and with probability
$u$ is a charge-changing mutant, the change in charge being equally likely
to be $+1$ and $-1$.
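
The evolutionary scheme just described is straightforward to simulate. The following Python sketch is illustrative only: the function names, the all-zero starting population and the use of Python's `random` module are assumptions made here, not part of the model as originally formulated.

```python
import random

def next_generation(charges, u, rng):
    """Return one Wright-Fisher daughter generation of the charge-state model."""
    daughters = []
    for _ in range(len(charges)):
        charge = rng.choice(charges)        # parent gene sampled with replacement
        if rng.random() < u:                # charge-changing mutation, probability u
            charge += rng.choice((-1, 1))   # +1 and -1 equally likely
        daughters.append(charge)
    return daughters

def simulate(two_n, u, generations, seed=0):
    """Evolve 2N genes, all starting at charge 0 (an arbitrary starting point)."""
    rng = random.Random(seed)
    charges = [0] * two_n
    for _ in range(generations):
        charges = next_generation(charges, u, rng)
    return charges
```

Calling, say, `simulate(200, 0.01, 2000)` and inspecting `max(...) - min(...)` of the result gives a feel for how widely the charges in a single generation are spread.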

At first sight it might seem that, as time progresses, the charge levels 
on the genes in future generations become dispersed over the entire array 
of positive and negative integers. But this is not so. Kingman recognized 
that there is a coherency to the locations of the charges on the genes 
brought about by common ancestry and the genealogy of the genes in 
any generation. In Kingman's words (Kingman 1976), amended here to 
our terminology, "The probability that [two genes in generation $t$] have a
common ancestor gene [in generation $s$, for $s < t$,] is $1 - (1 - (2N)^{-1})^{t-s}$,
which is near unity when $(t - s)$ is large compared to $2N$. Thus the
[locations of the charges in any generation] form a coherent group, . . . ,
and the relative distances between the [charges] remain stochastically 
bounded". We do not dwell here on the elegant theory that Kingman 
developed for this model, and note only that in the above quotation 
we see here the beginnings of the idea of looking backward in time to 
discuss properties of genetic variation observed in a contemporary gener- 
ation. This viewpoint is central to Kingman's concept of the coalescent, 
discussed in detail below. 
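
The probability in Kingman's quotation, $1 - (1 - (2N)^{-1})^{t-s}$, is easy to tabulate. The short Python sketch below does so; the function name and the example value $2N = 2000$ are chosen here purely for illustration.

```python
def prob_common_ancestor(two_n, generations_back):
    """Probability that two genes share an ancestor gene within
    `generations_back` generations, in a population of `two_n` genes."""
    return 1.0 - (1.0 - 1.0 / two_n) ** generations_back

# How the probability grows as t - s passes through multiples of 2N = 2000.
for multiple in (0.1, 1, 5, 20):
    t_minus_s = int(multiple * 2000)
    print(multiple, round(prob_common_ancestor(2000, t_minus_s), 4))
```

The output illustrates the remark that the probability is near unity once $t - s$ is large compared to $2N$.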

Parenthetically, the question of the mean number of 'alleles', or occupied
charge states, in a population of size $N$ ($2N$ genes) is of some
mathematical interest. This depends on the mutation rate $u$ and the
population size $N$. It was originally conjectured by Kimura and Ohta
(1978) that this mean remains bounded as $N \to \infty$. However, Kesten
(1980a, b) showed that it increases indefinitely as $N \to \infty$, but at an
extraordinarily slow rate. More exactly, he found the following astounding
result. Define $\gamma_0 = 1$, $\gamma_{k+1} = e^{\gamma_k}$, $k = 0, 1, 2, \ldots$, and $\lambda(2N)$ as the
largest $k$ such that $\gamma_k < 2N$. Suppose that $4Nu = 0.2$. Then the random
number of 'alleles' in the population divided by $\lambda(2N)$ converges
in probability to a constant whose value is approximately 2 as $N \to \infty$.
Some idea of the slowness of the divergence of the mean number of alleles
can be found by observing that if $2N = 10^{1656520}$, then $\lambda(2N) = 3$.
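
The function $\lambda(2N)$ can be evaluated even for astronomically large $2N$ by working with logarithms. The Python sketch below is a minimal illustration under the reading $\gamma_0 = 1$, $\gamma_{k+1} = e^{\gamma_k}$; the function name and the convention of passing $\log_{10}(2N)$ rather than $2N$ are assumptions made here.

```python
import math

def kesten_lambda(log10_two_n):
    """Largest k with gamma_k < 2N, where gamma_0 = 1 and gamma_{k+1} = exp(gamma_k).

    2N is passed as log10(2N), because 2N itself may be far too large for a float.
    """
    x = log10_two_n * math.log(10.0)    # x = ln(2N)
    if x <= 0.0:
        raise ValueError("2N must exceed gamma_0 = 1")
    k = 0
    # gamma_k < 2N is equivalent to gamma_{k-1} < ln(2N), so each pass
    # strips one logarithm from 2N and one exponential from gamma_k.
    while x > 1.0:
        k += 1
        x = math.log(x)
    return k

print(kesten_lambda(1656520))
```

Because only iterated logarithms of $2N$ are needed, ordinary floating-point arithmetic suffices here.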

In a later paper (Kingman 1977a), Kingman extended the theory to 
the multi-dimensional case, where it is assumed that data are available 
on a vector of measurements on each gene. Much of the theory for the 
one-dimensional charge-state model carries through more or less im- 
mediately to the multi-dimensional case. As the number of dimensions 
increases, some of this theory established by Kingman bears on the 'in- 
finitely many alleles' model discussed in the next paragraph, although as 
Kingman himself noted, the geometrical structure inherent in the model 
implies that a convergence of his results to those of the infinitely-many- 
alleles model does not occur, since the latter model has no geometrical 
structure. 

The infinitely-many-alleles model, introduced in the 1960s, forms the 
second background development that we discuss. This model has two 
components. The first is a purely demographic, or genealogical, model 
of the population. There are many such models, and here we consider 
only the Wright-Fisher model referred to above. (In the contemporary 
literature many other such models are discussed in the context of the 
infinitely-many-alleles model, particularly those of Moran (1958) and 
Cannings (1974), discussed in Section 4.) The second component refers to
the mutation assumption, superimposed on this model. In the infinitely- 
many-alleles model this assumption is that any new mutant gene is of 
an allelic type never before seen in the population. (This is motivated 
by the very large number of alleles possible at any gene locus, referred 
to above.) The model also assumes that the probability that any gene 
is a mutant is some fixed value u, independent of the allelic type of the 
parent and of the type of the mutant gene. 

From a practical point of view, the model assumes a technology (rel- 
evant to the 1960s) which is able to assess whether any two genes are of 
the same or are of different allelic types (unlike the charge-state model, 
which does not fully possess this capability), but which is not able to 
distinguish any further between two genes (as would be possible, for
example, if the DNA sequences of the two genes were known). Further,
since an entire generation of genes is never observed in practice, atten- 
tion focuses on the allelic configuration of the genes in a sample of size 
n, where n is assumed to be small compared to 2N, the number of genes
in the entire population. 

Given the nature of the mechanism assumed in this model for dis- 
tinguishing the allelic types of the n genes in the sample, the data in 
effect consist of a partition of the integer $n$ described by the vector
$(a_1, a_2, \ldots, a_n)$, where $a_i$ is the number of allelic types observed in the
sample exactly $i$ times each. It is necessary that $\sum i a_i = n$, and it turns
out that under this condition, and to a close approximation, the stationary
probability of observing this vector is

\[
\frac{n!\,\theta^{\sum a_i}}{1^{a_1} 2^{a_2} \cdots n^{a_n}\, a_1!\, a_2! \cdots a_n!\, S_n(\theta)}, \qquad (2.2)
\]

where $\theta$ is defined as $4Nu$ and $S_n(\theta) = \theta(\theta + 1)(\theta + 2) \cdots (\theta + n - 1)$
(Ewens (1972), Karlin and McGregor (1972)).
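
As a check on the form of (2.2), the following Python sketch evaluates the formula with exact rational arithmetic and enumerates all partitions of a small $n$. The function names and the representation of a partition as a tuple $(a_1, \ldots, a_n)$ are choices made here for illustration.

```python
from fractions import Fraction
from math import factorial

def ewens_probability(a, theta):
    """Probability (2.2) of the partition a, where a[j-1] is the number of
    allelic types seen exactly j times in the sample."""
    theta = Fraction(theta)
    n = sum(j * a_j for j, a_j in enumerate(a, start=1))
    denominator = Fraction(1)
    for i in range(n):
        denominator *= theta + i                      # builds S_n(theta)
    for j, a_j in enumerate(a, start=1):
        denominator *= j ** a_j * factorial(a_j)
    return factorial(n) * theta ** sum(a) / denominator

def allelic_partitions(n):
    """Yield every vector (a_1, ..., a_n) with sum_j j * a_j = n."""
    def fill(j, remaining, counts):
        if j == 0:
            if remaining == 0:
                yield tuple(counts)
            return
        for a_j in range(remaining // j + 1):
            counts[j - 1] = a_j
            yield from fill(j - 1, remaining - j * a_j, counts)
        counts[j - 1] = 0
    yield from fill(n, n, [0] * n)

total = sum(ewens_probability(a, Fraction(3, 2)) for a in allelic_partitions(6))
print(total)
```

Summing over all partitions of $n$ should give exactly 1, which is a useful sanity check on any implementation of the formula.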

The marginal distribution of the number $K = \sum a_i$ of distinct alleles
in the sample is found from (2.2) as

\[
\mathrm{Prob}(K = k) = |S_n^k|\,\theta^k / S_n(\theta), \qquad (2.3)
\]

where $S_n^k$ is a Stirling number of the first kind. It follows from (2.2)
and (2.3) that $K$ is a sufficient statistic for $\theta$, so that the conditional
distribution of $(a_1, a_2, \ldots, a_n)$ given $K$ is independent of $\theta$.
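
The distribution (2.3) can be computed directly from the standard recurrence for unsigned Stirling numbers of the first kind, $c(m+1, k) = m\,c(m, k) + c(m, k-1)$. The sketch below is illustrative; the function names are hypothetical and exact fractions are used only to make the check on the total probability exact.

```python
from fractions import Fraction

def unsigned_stirling_first(n):
    """Return the row [c(n, 0), ..., c(n, n)] of unsigned Stirling numbers
    of the first kind, built from c(m+1, k) = m*c(m, k) + c(m, k-1)."""
    row = [1]
    for m in range(n):
        new_row = [0] * (m + 2)
        for k, c in enumerate(row):
            new_row[k] += m * c
            new_row[k + 1] += c
        row = new_row
    return row

def distribution_of_k(n, theta):
    """Return {k: Prob(K = k)} for a sample of n genes, following (2.3)."""
    theta = Fraction(theta)
    s_n = Fraction(1)
    for i in range(n):
        s_n *= theta + i                              # S_n(theta)
    row = unsigned_stirling_first(n)
    return {k: row[k] * theta ** k / s_n for k in range(1, n + 1)}

print(distribution_of_k(4, Fraction(1, 2)))
```

The probabilities sum to 1 because $\sum_k |S_n^k|\,\theta^k = S_n(\theta)$.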

The relevance of this observation is as follows. As noted above, the 
extent of genetic variation in a population was, by electrophoresis and 
other methods, beginning to be understood in the 1960s. As a result of 
this knowledge, and for reasons not discussed here, Kimura advanced 
(Kimura 1968) the so-called 'neutral theory', in which it was claimed 
that much of the genetic variation observed did not have a selective 
basis. Rather, it was claimed that it was the result of purely random 
changes in allelic frequency inherent in the random sampling evolution- 
ary model outlined above. This (neutral) theory then becomes the null 
hypothesis in a statistical testing procedure, with some selective mechanism
being the alternative hypothesis. Thus the expression in (2.2) is
the null hypothesis allelic-partition distribution of the alleles in a sample
of size $n$. The fact that the conditional distribution of $(a_1, a_2, \ldots, a_n)$
given $K$ is independent of $\theta$ implies that an objective testing procedure
for the neutral theory can be found free of unknown parameters.

Both authors of this paper worked on aspects of this statistical testing
theory during the period 1972-1978, and further reference to this is made
below. The random sampling evolutionary scheme described above is no
doubt a simplification of real evolutionary processes, so in order for the
testing theory to be applicable to more general evolutionary models it
is natural to ask: "To what extent does the expression in (2.2) apply
for evolutionary models other than that described above?" One of us
(GAW) worked on this question in the mid-1970s (Watterson, 1974a,
1974b). This question is also discussed below.



3 Putting it together 

One of us (GAW) read Kingman's 1975 paper soon after it appeared 
and recognized its potential application to population genetics theory. 
In the 1970s the joint density function (2.1) was well known to arise in
that theory when some fixed finite number $K$ of alleles is possible at the
gene locus of interest, with symmetric mutation between these alleles. In
population genetics theory one considers, as mentioned above, infinitely
many possible alleles at any gene locus, so that the relevance of Kingman's
limiting ($K \to \infty$) procedure to the infinitely many alleles model,
that is the relevance of the Kingman distribution, became immediately
apparent.

This observation led (Watterson 1976) to a derivation of an explicit
form for the joint density function of the first $r$ order statistics
$x_{(1)}, x_{(2)}, \ldots, x_{(r)}$ in the Kingman distribution. (There is an obvious
printer's error in equation (8) of Watterson's paper.) This joint density
function was shown to be of the form

\[
f(x_{(1)}, x_{(2)}, \ldots, x_{(r)}) = \theta^r\, \Gamma(\theta)\, e^{\gamma\theta}\, g(y)\, \{x_{(1)} x_{(2)} \cdots x_{(r)}\}^{-1}\, x_{(r)}^{\theta-1}, \qquad (3.1)
\]

where $y = (1 - x_{(1)} - x_{(2)} - \cdots - x_{(r)})/x_{(r)}$, $\gamma$ is Euler's constant
0.57721 . . ., and $g(y)$ is best defined through the Laplace transform
equation (Watterson and Guess (1977))

\[
\int_0^\infty e^{-ty} g(y)\, dy = \exp\left( \theta \int_0^1 u^{-1} (e^{-tu} - 1)\, du \right). \qquad (3.2)
\]

The expression (3.1) simplifies to

\[
f(x_{(1)}, \ldots, x_{(r)}) = \theta^r \{x_{(1)} \cdots x_{(r)}\}^{-1} (1 - x_{(1)} - \cdots - x_{(r)})^{\theta-1} \qquad (3.3)
\]

when $x_{(1)} + x_{(2)} + \cdots + x_{(r-1)} + 2x_{(r)} \ge 1$, and in particular,

\[
f(x_{(1)}) = \theta\, (x_{(1)})^{-1} (1 - x_{(1)})^{\theta-1} \qquad (3.4)
\]

when $\tfrac{1}{2} \le x_{(1)} \le 1$.

Population geneticists are interested in the probability of 'population
monomorphism', defined in practice as the probability that the most
frequent allele arises in the population with frequency in excess of 0.99.
Integrating (3.4) over (0.99, 1), where the factor $(x_{(1)})^{-1}$ is close to 1,
shows that this probability is close to $(0.01)^\theta$.
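
The integral of the density $\theta x^{-1}(1 - x)^{\theta - 1}$ of (3.4) over an interval $(c, 1)$ with $c \ge \tfrac{1}{2}$ can be checked numerically. The Python sketch below is one way to do this; the function name, the midpoint rule and the substitution $v = (1 - x)^\theta$ are choices made here for illustration.

```python
import math

def prob_top_frequency_exceeds(theta, cutoff, steps=100_000):
    """Integrate theta * x**-1 * (1 - x)**(theta - 1) over (cutoff, 1).

    The substitution v = (1 - x)**theta removes the endpoint singularity and
    turns the integral into that of 1 / (1 - v**(1/theta)) over
    (0, (1 - cutoff)**theta).
    """
    if not 0.5 <= cutoff < 1.0:
        raise ValueError("the density used here is only valid for cutoff in [1/2, 1)")
    upper = (1.0 - cutoff) ** theta
    h = upper / steps
    total = 0.0
    for i in range(steps):                  # midpoint rule
        v = (i + 0.5) * h
        total += 1.0 / (1.0 - v ** (1.0 / theta))
    return total * h

for theta in (0.1, 0.5, 1.0):
    print(theta, prob_top_frequency_exceeds(theta, 0.99), 0.01 ** theta)
```

For a cutoff of 0.99 the integral lies between $(0.01)^\theta$ and $(0.01)^\theta / 0.99$, since $1 \le x^{-1} \le 1/0.99$ on the range of integration.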

Kingman himself had placed some special emphasis on the largest of 
the order statistics, which in the genetics context is the allele frequency 
of the most frequent allele. This leads to interesting questions in genetics.
For instance, Crow (1973) had asked: "What is the probability that
the most frequent allele in a population at any time is also the oldest
allele in the population at that time?" A nice application of reversibility
arguments for suitable population models allowed Watterson and
Guess (1977) to obtain a simple answer to this question. In models where
all alleles are equally fit, the probability that any nominated allele will
survive longest into the future is (by a simple symmetry argument) its
current frequency. For time-reversible processes, this is also the probability
that it is the oldest allele in the population. Thus conditional on the
current allelic frequencies, the probability that the most frequent allele
is also the oldest is simply its frequency $x_{(1)}$. Thus the answer to Crow's
question is simply the mean frequency of the most frequent allele. A formula
for this mean frequency, as a function of the mutation parameter
$\theta$, together with some numerical values, was given in Watterson and
Guess (1977), and a partial listing is given in the first row of Table 3.1.
(We discuss the entries in the second row of this table in Section 7.)

Table 3.1 Mean frequency of (a) the most frequent allele, (b) the
oldest allele, in a population as a function of $\theta$. The probability that
the most frequent allele is the oldest allele is also its mean frequency.

$\theta$          0.1    0.2    0.5    1.0    2.0    5.0    10.0   20.0

Most frequent     0.936  0.882  0.758  0.624  0.476  0.297  0.195  0.122
Oldest            0.909  0.833  0.667  0.500  0.333  0.167  0.091  0.048



As will be seen from the table, the mean frequency $E(x_{(1)})$ of the most
frequent allele decreases as $\theta$ increases. Watterson and Guess (1977)
provided the bounds $(\tfrac{1}{2})^\theta \le E(x_{(1)}) \le 1 - \theta(1 - \theta)\log 2$, which give
an idea of the value of $E(x_{(1)})$ for small values of $\theta$, and also showed
that $E(x_{(1)})$ decreases asymptotically like $(\log \theta)/\theta$, giving an idea of
the value of $E(x_{(1)})$ for large $\theta$.
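
The bounds can be compared with the tabulated means directly. The following Python sketch assumes the bounds read $(\tfrac{1}{2})^\theta \le E(x_{(1)}) \le 1 - \theta(1 - \theta)\log 2$ and uses the first row of Table 3.1; the dictionary and function names are introduced here for illustration only.

```python
import math

# First row of Table 3.1: theta -> mean frequency of the most frequent allele.
TABLE_3_1 = {0.1: 0.936, 0.2: 0.882, 0.5: 0.758, 1.0: 0.624,
             2.0: 0.476, 5.0: 0.297, 10.0: 0.195, 20.0: 0.122}

def watterson_guess_bounds(theta):
    """Lower and upper bounds on the mean frequency of the most frequent allele."""
    lower = 0.5 ** theta
    upper = 1.0 - theta * (1.0 - theta) * math.log(2.0)
    return lower, upper

for theta, mean in TABLE_3_1.items():
    lower, upper = watterson_guess_bounds(theta)
    print(f"theta={theta:5.1f}  {lower:.3f} <= {mean:.3f} <= {upper:.3f}")
```

For $\theta > 1$ the upper bound exceeds 1 and so carries no information, which is consistent with the remark that the bounds are mainly informative for small $\theta$.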

From the point of view of testing the neutral theory of Kimura, Wat- 
terson (1977, 1978) subsequently used properties of these order statistics 
for testing the null hypothesis that there are no selective forces determ- 
ining observed allelic frequencies. He considered various alternatives, 
particularly heterozygote advantage or the presence of some deleterious 
alleles. For instance, in (Watterson 1977) he investigated the situation 
when all heterozygotes had a slight selective advantage over all homozygotes.
The population truncated homozygosity $\sum_{i=1}^{r} x_{(i)}^2$ figures prominently
in the allelic distribution corresponding to (3.1) and was thus
studied as a test statistic for the null hypothesis of no selective advantage.
Similarly, when only a random sample of n genes is taken from the
population, the sample homozygosity can be used as a test statistic of 
neutrality. 

Here we make a digression to discuss two of the values in the first row
of Table 3.1. It is well known that in the case $\theta = 1$, the allelic partition
formula (2.2) describes the probabilistic structure of the lengths of the
cycles in a random permutation of the numbers $\{1, 2, \ldots, n\}$. Each cycle
corresponds to an allelic type, and in this notation $a_j$ thus indicates the
number of cycles of length $j$. Various limiting ($n \to \infty$) properties of
random permutations have long been of interest (see for example Finch
(2003)). Finch (page 284) gives the limiting mean of the normalized
length of the longest cycle as 0.624 ... in such a random permutation,
and this agrees with the value listed in Table 3.1 for the case $\theta = 1$.
(Finch also in effect gives the standard deviation of this normalized
length as 0.1921 . . ..) Next, (3.4) shows that the limiting probability
that the (normalized) length of the longest cycle exceeds $\tfrac{1}{2}$ is $\log 2$. This
is the limiting value of the exact probability for a random permutation
of the numbers $\{1, 2, \ldots, n\}$, which from (2.2) is $1 - \tfrac{1}{2} + \tfrac{1}{3} - \cdots \pm \tfrac{1}{n}$.
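
For small $n$ the alternating-sum expression can be confirmed by brute force. The Python sketch below enumerates all permutations of $\{0, 1, \ldots, n-1\}$ and counts those whose longest cycle has length exceeding $n/2$; the function names are hypothetical and the enumeration is feasible only for small $n$.

```python
from fractions import Fraction
from itertools import permutations

def alternating_sum(n):
    """Return 1 - 1/2 + 1/3 - ... +/- 1/n as an exact fraction."""
    return sum(Fraction((-1) ** (k + 1), k) for k in range(1, n + 1))

def longest_cycle(perm):
    """Length of the longest cycle of a permutation given as a tuple of images."""
    seen = [False] * len(perm)
    best = 0
    for start in range(len(perm)):
        length = 0
        j = start
        while not seen[j]:                  # walk the cycle containing `start`
            seen[j] = True
            j = perm[j]
            length += 1
        best = max(best, length)
    return best

def prob_long_cycle(n):
    """Exact probability that the longest cycle exceeds n/2, by enumeration."""
    hits = total = 0
    for perm in permutations(range(n)):
        total += 1
        if 2 * longest_cycle(perm) > n:
            hits += 1
    return Fraction(hits, total)

for n in range(1, 8):
    print(n, prob_long_cycle(n), alternating_sum(n))
```

The two columns printed agree for each $n$ tried, and `float(alternating_sum(n))` approaches $\log 2$ as $n$ grows.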

Finch also considers aspects of a random mapping of $\{1, 2, \ldots, n\}$ to
$\{1, 2, \ldots, n\}$. Any such mapping forms a random number of 'components',
each component consisting of a cycle with a number (possibly zero)
of branches attached to it. Aldous (1985) provides a full description of
these, with diagrams which help in understanding them. Finch takes up
the question of finding properties of the normalized size of the largest
component of such a random mapping, giving (page 289) a limiting mean
of 0.758 ... for this. This agrees with the value in Table 3.1 for the case
$\theta = 0.5$. This is no coincidence: Aldous (1985) shows that in a limiting
sense (2.2) provides the limiting distribution of the number and (unnormalized)
sizes of the components of this mapping, with now $a_j$ indicating
the number of components of size $j$. As a further result, (3.4) shows that
the limiting probability that the (normalized) size of the largest component
of a random mapping exceeds $\tfrac{1}{2}$ is $\log(1 + \sqrt{2}) \approx 0.881374$.

Arratia et al. (2003) show that (2.2) provides, for various values of
$\theta$, the partition structure of a variety of other combinatorial objects for
finite $n$, and presumably the Kingman distribution describes appropriate
limiting ($n \to \infty$) results. Thus the genetics-based equation (2.2) and
the Kingman distribution provide a unifying theme for these objects.

The allelic partition formula (2.2) was originally derived without reference
to the $K$-allele model (2.1), but was also found (Watterson, 1976)
from that model as follows. We start with a population whose allele
frequencies are given by the Dirichlet distribution (2.1). If a random sample
of $n$ genes is taken from such a population, then given the population's
allele frequencies, the sample allele frequencies have a multinomial
distribution. Averaging this distribution over the population distribution
(2.1), and then introducing the alternative order-statistic sample
description $(a_1, a_2, \ldots, a_n)$ as above, the limiting distribution is the
partition formula (2.2), found by letting $K \to \infty$ and $\alpha \to 0$ in (2.1) in such
a way that the product $K\alpha$ remains fixed at a constant value $\theta$.
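
The limiting argument can be illustrated numerically. The Python sketch below computes the probability of a sample partition under the symmetric $K$-allele Dirichlet model (the multinomial probability averaged over the Dirichlet distribution, multiplied by the number of ways of assigning allele labels) and compares it with (2.2) for a large $K$. The function names are hypothetical, and the finite-$K$ expression is our own working of the averaging step rather than a formula quoted from the source.

```python
from math import factorial

def rising(x, n):
    """Rising factorial x (x + 1) ... (x + n - 1)."""
    out = 1.0
    for i in range(n):
        out *= x + i
    return out

def ewens(a, theta):
    """Partition probability (2.2)."""
    n = sum(j * a_j for j, a_j in enumerate(a, start=1))
    denominator = rising(theta, n)
    for j, a_j in enumerate(a, start=1):
        denominator *= j ** a_j * factorial(a_j)
    return factorial(n) * theta ** sum(a) / denominator

def k_allele_partition_probability(a, theta, big_k):
    """Partition probability for a sample from the symmetric K-allele
    Dirichlet population with alpha = theta / K."""
    alpha = theta / big_k
    n = sum(j * a_j for j, a_j in enumerate(a, start=1))
    k = sum(a)
    labelings = 1.0
    for i in range(k):
        labelings *= big_k - i              # K! / (K - k)!
    prob = labelings * factorial(n) / rising(theta, n)
    for j, a_j in enumerate(a, start=1):
        prob *= (rising(alpha, j) / factorial(j)) ** a_j / factorial(a_j)
    return prob

a = (1, 1, 0)                               # one singleton and one doubleton, n = 3
for big_k in (10, 1000, 10 ** 6):
    print(big_k, k_allele_partition_probability(a, 1.5, big_k), ewens(a, 1.5))
```

As $K$ increases with $K\alpha = \theta$ held fixed, the finite-$K$ probability approaches the value given by (2.2).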



4 Robustness 

As stated above, the expression (2.2) was first found by assuming a random
sampling evolutionary model. As also noted, it can be arrived
at by assuming that a random sample of genes has been taken from an infinite
population whose allele frequencies have the Dirichlet distribution
(2.1). It applies, however, to further models. Moran (1958) introduced
a 'birth-and-death' model in which, at each unit time point, a gene is
chosen at random from the population to die. Another gene is chosen at
random to reproduce. The new gene either inherits the allelic type of its
parent (probability $1 - u$), or is of a new allelic type, not so far seen in
the population, with probability $u$. Trajstman (1974) showed that (2.2)
applies as the stationary allelic partition distribution exactly for Moran's
model, but with $n$ replaced by the finite population number of genes $2N$
and with $\theta$ defined as $2Nu/(1 - u)$. More than this, if a random sample
of size $n$ is taken without replacement from the Moran model population,
it too has an exact description as in (2.2). This result is a consequence
of Kingman's (1978b) study of the consistency of the allelic properties
of sub-samples of samples. (In practice, of course, the difference between
sampling with, or without, replacement is of little consequence for small
samples from large populations.) Kingman (1977a, 1977b) followed up
this result by showing that random sampling from various other population
models, including significant cases of the Cannings (1974) model,
could also be approximated by (2.2). This was important because several
consequences of (2.2) could then be applied more generally than
was first thought, especially for the purposes of testing of the neutral
alleles postulate. He also used the concept of 'non-interference' (see the
concluding comments in Section 6) as a further reason for the robustness
of (2.2).



5 A convergence result 

It was noted in Section 3 that Watterson (1976) was able to arrive at
both the Kingman distribution and the allelic partition formula (2.2)
from the same starting point (the '$K$-allele' model). This makes it clear
that there must be a close connection between the two, and in this
section we outline Kingman's work (Kingman 1977b) which made this
explicit. Kingman imagined a sequence of populations in which the size
of population $i$ ($i = 1, 2, \ldots$) tends to infinity as $i \to \infty$. For any
fixed $i$ and any fixed sample size $n$ of genes taken from the population,
there will be some probability of the partition $\{a_1, a_2, \ldots, a_n\}$,
where $a_j$ has the definition given in Section 2. Kingman then stated
that this sequence of populations would have the Ewens sampling property
if, for each fixed $n$, this corresponding sequence of probabilities of
$\{a_1, a_2, \ldots, a_n\}$ approached that given in (2.2) as $i \to \infty$. In a parallel
fashion, for each fixed $i$ there will also be a probability distribution for
the order statistics $(p_1, p_2, \ldots)$, where $p_j$ denotes the frequency of the
$j$th most frequent allele in the population. Kingman then stated that
this sequence would have the Poisson-Dirichlet limit if this sequence of
probabilities approached that given by the Poisson-Dirichlet distribution.
(We would replace 'Poisson-Dirichlet' in this sentence by 'Kingman'.)
He then showed that this sequence of populations has the Ewens
sampling property if and only if it has the Poisson-Dirichlet (Kingman
distribution) limit.

The proof is quite technical and we do not discuss it here. We have 
noted that the Kingman distribution may be thought of as the distribu- 
tion of the (ordered) allelic frequencies in an infinitely large population 
evolving as the random sampling infinitely-many-allele process, so this
result provides a beautiful (and useful) relation between population and 
sample properties of such a population. 



6 Partition structures 

By 1977 Kingman was in full flight in his investigation of various genetics 
problems. One line of his work started with the probability distribution 
(2.2), and his initially innocent-seeming observation that the size $n$ of
the sample of genes bears further consideration. The size of a sample is
generally taken in Statistics as being comparatively uninteresting, but
Kingman (1978b) noted that a sample of $n$ genes could be regarded as
having arisen from a sample of $n + 1$ genes, one of which was accidentally
lost, and that this observation induces a consistency property on the
probability of any partition of the number $n$. Specifically, he observed
that if we write $P_n(a_1, a_2, \ldots)$ for the probability of the sample partition
in a sample of size $n$, we require

\[
P_n(a_1, a_2, \ldots) = \frac{a_1 + 1}{n + 1}\, P_{n+1}(a_1 + 1, a_2, \ldots)
  + \sum_{j=2}^{n+1} \frac{j(a_j + 1)}{n + 1}\, P_{n+1}(a_1, \ldots, a_{j-1} - 1, a_j + 1, \ldots). \qquad (6.1)
\]

Fortunately, the distribution (2.2) does satisfy this equation. But Kingman
went on to ask a deeper question: "What are the most general
distributions that satisfy equation (6.1)?" These distributions he called
'partition structures'. He showed that all such distributions that are of
interest in genetics could be represented in the form

\[
P_n(a_1, a_2, \ldots) = \int P_n(a_1, a_2, \ldots \mid \mathbf{x})\, \mu(d\mathbf{x}), \qquad (6.2)
\]

where $\mu$ is some probability measure over the space of infinite sequences
$(x_1, x_2, x_3, \ldots)$ satisfying $x_1 \ge x_2 \ge x_3 \ge \cdots$, $\sum_{n=1}^{\infty} x_n = 1$.
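
That the distribution (2.2) satisfies the consistency condition (6.1) can be confirmed with exact arithmetic. In the Python sketch below, the right-hand side of (6.1) is read as the sum over the two ways a gene can be lost: it was a singleton (coefficient $(a_1 + 1)/(n + 1)$), or it belonged to a type seen $j \ge 2$ times (coefficient $j(a_j + 1)/(n + 1)$). The function names and the hand-listed partitions of 4 are illustrative choices.

```python
from fractions import Fraction
from math import factorial

def ewens(a, theta):
    """Partition probability (2.2) for a = (a_1, a_2, ...)."""
    theta = Fraction(theta)
    n = sum(j * a_j for j, a_j in enumerate(a, start=1))
    denominator = Fraction(1)
    for i in range(n):
        denominator *= theta + i                      # S_n(theta)
    for j, a_j in enumerate(a, start=1):
        denominator *= j ** a_j * factorial(a_j)
    return factorial(n) * theta ** sum(a) / denominator

def consistency_rhs(a, theta):
    """Right-hand side of (6.1), evaluated with P_{n+1} given by (2.2)."""
    n = sum(j * a_j for j, a_j in enumerate(a, start=1))
    a = list(a) + [0] * (n + 1 - len(a))              # room for a type seen n + 1 times
    grown = a.copy()
    grown[0] += 1                                     # the lost gene was a singleton
    total = Fraction(a[0] + 1, n + 1) * ewens(grown, theta)
    for j in range(2, n + 2):                         # the lost gene's type was seen j times
        if a[j - 2] == 0:
            continue                                  # needs a_{j-1} >= 1 in the smaller sample
        grown = a.copy()
        grown[j - 2] -= 1
        grown[j - 1] += 1
        total += Fraction(j * (a[j - 1] + 1), n + 1) * ewens(grown, theta)
    return total

PARTITIONS_OF_4 = [(4, 0, 0, 0), (2, 1, 0, 0), (0, 2, 0, 0), (1, 0, 1, 0), (0, 0, 0, 1)]
for a in PARTITIONS_OF_4:
    print(a, ewens(a, Fraction(5, 3)) == consistency_rhs(a, Fraction(5, 3)))
```

Using `Fraction` for $\theta$ makes the comparison exact rather than subject to rounding.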

An intuitive understanding of this equation is the following. One way
to obtain a consistent set of distributions satisfying (6.1) is to imagine
a hypothetically infinite population of types, with a proportion $x_1$ of
the most frequent type, a proportion $x_2$ of the second most frequent
type, and so on, forming a vector $\mathbf{x}$. For a fixed value of $n$, one could
then imagine taking a sample of size $n$ from this population, and write
$P_n(a_1, a_2, \ldots \mid \mathbf{x})$ for the (effectively multinomial) probability that the
configuration of the sample is $(a_1, a_2, \ldots)$. It is clear that the resulting
sampling probabilities will automatically satisfy the consistency property
in (6.1). More generally one could imagine the composition of the
infinite population itself being random, so that first one chooses its
composition $\mathbf{x}$ from $\mu$, and then conditional on $\mathbf{x}$ one takes a sample of size
$n$ with probability $P_n(a_1, a_2, \ldots \mid \mathbf{x})$. The right-hand side in (6.2) is then
the probability of obtaining the sample configuration $(a_1, a_2, \ldots)$ averaged
over the composition of the population. Kingman's remarkable
result was that all partition structures arising in genetics must have the
form (6.2), for some $\mu$. Kingman called partition structures that could be
expressed as in (6.2) 'representable partition structures' and $\mu$ the
'representing measure', and later (Kingman 1978c) found a representation
generalizing (6.2) applying for any partition structure.

The similarity between (6.2) and the celebrated de Finetti representation
theorem for exchangeable sequences might be noted. This has been
explored by Aldous (1985) and Kingman (1978a), but we do not pursue 
the details of this here. 

In the genetics context, the results of Section 4 show that samples from
Moran's infinitely many neutral alleles model, as well as the population
as a whole, have the partition structure property. So do samples of genes
from other genetical models. This makes it natural to ask: "What is the
representing measure $\mu$ for the allelic partition distribution (2.2)?" And
here we come full circle, since Kingman showed that the required representing
measure is the Kingman distribution, found by him in (Kingman, 1975)
in quite a different context!

The relation between the Kingman distribution and the sampling distribution
(2.2) is of course connected to the convergence results discussed
in the previous section. From the point of view of the geneticist,
the Kingman distribution is then regarded as applying for an infinitely
large population, evolving essentially via the random sampling process
that led to (2.2). This was made precise by Kingman in (1978b), and
it makes it unfortunate that the Kingman distribution does not have
a 'nice' mathematical form. However, we see in Section 7 that a very
pretty analogue of the Kingman distribution exists when we label alleles 
not by their frequencies but by their ages in the population. This in 
turn leads to the capstone of Kingman's work in genetics, namely the 
coalescent process. 

Before discussing these matters we mention another property enjoyed
by the distribution (2.2) that Kingman investigated, namely that of
non-interference. Suppose that we take a gene at random from the sample
of $n$ genes, and find that there are in all $r$ genes of the allelic type of
this gene in the sample. These $r$ genes are now removed, leaving $n - r$
genes. The non-interference requirement is that the probability structure
of these $n - r$ genes should be the same as that of an original sample
of $n - r$ genes, simply replacing $n$ wherever found by $n - r$. Kingman
showed that of all partition structures of interest in genetics, the only one
also satisfying this non-interference requirement is (2.2). This explains
in part the robustness properties of (2.2) to various evolutionary genetic
models. However, it also has a natural interpretation in terms of the
coalescent process, to be discussed in Section 8.

We remark in conclusion that the partition structure concept has be- 
come influential not only in the genetics context, but in Bayesian stat- 
istics, mathematics and various areas of science, as the papers of Aldous 
(2009) and of Gnedin, Haulk and Pitman (2009) in this Festschrift show. 
That this should be so is easily understood when one considers the nat- 
ural logic of the ideas leading to it. 



7 'Age' properties and the GEM distribution 

We have noted above that the Kingman distribution is not user-friendly. 
This makes it all the more interesting that a size-biased distribution 
closely related to it, namely the GEM distribution, named for Griffiths 
(1980), Engen (1975) and McCloskey (1965), who established its sali- 
ent properties, is both simple and elegant, thus justifying the acronym 
'GEM'. More important, it has a central interpretation with respect to 
the ages of the alleles in a population. We now describe this distribution. 
We have shown that the ordered allelic frequencies in the population 
follow the Kingman distribution. Suppose that a gene is taken at random 
from the population. The probability that this gene will be of an allelic 
type whose frequency in the population is x is just x. This allelic type 
was thus sampled by this choice in a size-biased way. It can be shown 
from properties of the Kingman distribution that the probability density 
of the frequency of the allele determined by this randomly chosen gene 
is 

$$f(x) = \theta (1 - x)^{\theta - 1}, \qquad 0 < x < 1. \tag{7.1}$$

This result was also established by Ewens (1972). 

Suppose now that all genes of the allelic type just chosen are removed
from the population. A second gene is now drawn at random from the
population and its allelic type observed. The frequency of the allelic
type of this gene among the genes remaining at this stage is also given
by (7.1). All genes of this second allelic type are now also removed from
the population. A third gene is then drawn at random from the genes
remaining, its allelic type observed, and all genes of this (third) allelic
type removed from the population. This process is continued indefinitely.
At any stage, the distribution of the frequency of the allelic type of any
gene just drawn among the genes left when the draw takes place is given
by (7.1). This leads to the following representation. Denote by $w_j$ the
population frequency of the $j$th allelic type drawn. Then we can write

$$w_1 = x_1, \qquad w_j = (1 - x_1)(1 - x_2) \cdots (1 - x_{j-1})\, x_j \quad (j = 2, 3, \ldots), \tag{7.2}$$

where the $x_j$ are independent random variables, each having the
distribution (7.1). The random vector $(w_1, w_2, \ldots)$ then has the GEM
distribution.
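The representation (7.2) is a stick-breaking construction and is easy to explore numerically. The following Python fragment is only a minimal sketch and is not taken from the original text: the function name `sample_gem`, the truncation at `n_terms` terms and the seed are assumptions made for illustration. It relies on the fact that the density (7.1) is that of a Beta(1, $\theta$) random variable.

```python
import random


def sample_gem(theta, n_terms, rng):
    """Return the first n_terms weights (w_1, ..., w_n) built as in (7.2).

    Hypothetical helper for illustration: each x_j is drawn from the
    density theta * (1 - x)**(theta - 1), i.e. a Beta(1, theta) variable.
    """
    weights = []
    remaining = 1.0  # product (1 - x_1) ... (1 - x_{j-1}) not yet allocated
    for _ in range(n_terms):
        x = rng.betavariate(1.0, theta)
        weights.append(remaining * x)
        remaining *= 1.0 - x
    return weights


rng = random.Random(0)  # arbitrary seed, chosen only for reproducibility
w = sample_gem(theta=2.0, n_terms=50, rng=rng)
print(w[:3], sum(w))
```

Because the vector is truncated, the weights sum to slightly less than 1; the shortfall is the unallocated remainder $(1 - x_1) \cdots (1 - x_{50})$.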

All the alleles in the population at any time eventually leave the pop- 
ulation, through the joint processes of mutation and random drift, and 
any allele with current population frequency x survives the longest with 
probability x. That is, since the GEM distribution was found according 
to a size-biased process, it also arises when alleles are labelled according 
to the length of their future persistence in the population. Time reversib- 
ility arguments then show that the GEM distribution also applies when 
the alleles in the population are labelled by their age. In other words, 
the vector $(w_1, w_2, \ldots)$ can be thought of as the vector of allelic
frequencies when alleles are ordered with respect to their ages in the
population (with allele 1 being the oldest).

The Kingman coalescent, to be discussed in the following section, is 
concerned among other things with 'age' properties of the alleles in the 
population. We thus present some of these properties here as an
introduction to the coalescent: a more complete list can be found in Ewens
(2004). The elegance of many age-ordered formulae derives directly from
the simplicity and tractability of the GEM distribution.

Given the focus on retrospective questions, it is natural to ask ques- 
tions about the oldest allele in the population. The GEM distribution 
shows that the mean population frequency of the oldest allele in the 
population is 

$$\theta \int_0^1 x (1 - x)^{\theta - 1}\,dx = \frac{1}{1 + \theta}. \tag{7.3}$$

This implies that when $\theta$ is very small, this mean frequency is
approximately $1 - \theta$. It is interesting to compare this with the mean
frequency of the most frequent allele when $\theta$ is small, found in effect
from the Kingman distribution to be approximately $1 - \theta \log 2$. A more
general set of comparisons of these two mean frequencies, for
representative values of $\theta$, is given in Table 7.1.

More generally, the mean population frequency of the jth oldest allele 
in the population is 

$$\frac{1}{1 + \theta} \left( \frac{\theta}{1 + \theta} \right)^{j-1}.$$

For the case $\theta = 1$, Finch (2003) gives the mean frequencies of the
second and third most frequent alleles as 0.20958... and 0.088316...
respectively, which may be compared to the mean frequencies of the
second and third oldest alleles, namely 0.25 and 0.125. For $\theta = 1/2$ the
mean frequency of the second most frequent allele is 0.170910..., while
the mean frequency of the second oldest allele is 0.22222.
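The values quoted for the oldest alleles follow directly from the geometric form of the mean frequency, $\frac{1}{1+\theta}\bigl(\frac{\theta}{1+\theta}\bigr)^{j-1}$. As a small check (a sketch only; the function name `mean_freq_jth_oldest` is introduced here for illustration), exact rational arithmetic reproduces them:

```python
from fractions import Fraction


def mean_freq_jth_oldest(theta, j):
    """Mean population frequency of the j-th oldest allele (exact)."""
    theta = Fraction(theta)
    return (1 / (1 + theta)) * (theta / (1 + theta)) ** (j - 1)


for theta, j in [(1, 2), (1, 3), (Fraction(1, 2), 2)]:
    value = mean_freq_jth_oldest(theta, j)
    print(theta, j, value, float(value))
```

The three cases give 1/4, 1/8 and 2/9, matching the figures 0.25, 0.125 and 0.22222 in the text.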

Next, the probability that a gene drawn at random from the population
is of the type of the oldest allele is the mean frequency of the oldest
allele, namely $1/(1 + \theta)$, as just shown (see also Table 7.1). More
generally the probability that $n$ genes drawn at random from the population
are all of the type of the oldest allele in the population is

$$\theta \int_0^1 x^n (1 - x)^{\theta - 1}\,dx = \frac{n!}{(1 + \theta)(2 + \theta) \cdots (n + \theta)}. \tag{7.4}$$

The GEM distribution has a number of interesting mathematical
properties, of which we mention here only one. It is a so-called 'residual
allocation' model (Halmos 1944). Halmos envisaged a king with one kilogram
of gold dust, and an infinitely long line of beggars asking for gold. To
the first beggar the king gives $w_1$ kilogram of gold, to the second $w_2$
kilogram of gold, and so on, as specified in (7.2), where the $x_j$ are
independently and identically distributed (i.i.d.) random variables, each
having some probability distribution over the interval (0, 1).

Different forms of this distribution lead to different properties of the
distribution of the 'residual allocations' $w_1, w_2, w_3, \ldots$. One such
property is that the distribution of $w_1, w_2, w_3, \ldots$ be invariant under
size-biased sampling. It can be shown that the GEM distribution is the only
residual allocation model having this property. This fact has been
exploited by Hoppe (1986, 1987) to derive various results of interest in
genetics and ecology.

We now turn to sampling results. The probability that $n$ genes drawn
at random from the population are all of the same allelic type as the
oldest allele in the population is given in (7.4). The probability that $n$
genes drawn at random from the population are all of the same
unspecified allelic type is

$$\theta \int_0^1 x^{n-1} (1 - x)^{\theta - 1}\,dx = \frac{(n-1)!}{(1 + \theta)(2 + \theta) \cdots (n + \theta - 1)},$$

in agreement with (2.2) for the case $a_j = 0$, $j = 1, 2, \ldots, n - 1$, $a_n = 1$.
From this result and that in (7.4), given that $n$ genes drawn at random
are all of the same allelic type, the probability that they are all of the
allelic type of the oldest allele is $n/(n + \theta)$. The similarity of this
expression with that deriving from a Bayesian calculation is of some interest.
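The ratio $n/(n + \theta)$ can be confirmed by dividing the two closed forms: $n!/\{(1+\theta)\cdots(n+\theta)\}$ for "all $n$ genes are of the oldest type" and $(n-1)!/\{(1+\theta)\cdots(n+\theta-1)\}$ for "all $n$ genes are of one unspecified type". The sketch below does this in exact arithmetic; the helper names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from math import factorial


def rising_from_one(theta, m):
    """Return (1 + theta)(2 + theta)...(m + theta) as an exact fraction."""
    theta = Fraction(theta)
    prod = Fraction(1)
    for i in range(1, m + 1):
        prod *= i + theta
    return prod


def prob_all_oldest(n, theta):
    """P(all n sampled genes are of the oldest allelic type)."""
    return factorial(n) / rising_from_one(theta, n)


def prob_all_same(n, theta):
    """P(all n sampled genes are of one, unspecified, allelic type)."""
    return factorial(n - 1) / rising_from_one(theta, n - 1)


theta = Fraction(3, 2)  # arbitrary illustrative value
for n in (1, 2, 5, 10):
    ratio = prob_all_oldest(n, theta) / prob_all_same(n, theta)
    print(n, ratio, Fraction(n) / (n + theta))
```

For each $n$ the two printed fractions coincide, as the conditional probability argument requires.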

Perhaps the most important sample distribution concerns the
frequencies of the alleles in the sample when ordered by age. This
distribution was found by Donnelly and Tavaré (1986), who showed that the
probability that the number of alleles in the sample takes the value $k$, and
that the age-ordered numbers of these alleles in the sample are, in age
order, $n_{(1)}, n_{(2)}, \ldots, n_{(k)}$, is

$$\frac{\theta^k (n-1)!}{S_n(\theta)\, n_{(k)} \bigl(n_{(k)} + n_{(k-1)}\bigr) \cdots \bigl(n_{(k)} + n_{(k-1)} + \cdots + n_{(2)}\bigr)}, \tag{7.5}$$

where $S_n(\theta)$ is defined below (2.2). This formula can be found in several
ways, one being as the size-biased version of (2.2).
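One way to gain confidence in (7.5) is to check that it sums to 1 over all age-ordered configurations, that is, over all ordered tuples of positive integers with total $n$. The sketch below does so exactly, taking $S_n(\theta) = \theta(\theta+1)\cdots(\theta+n-1)$ and listing counts oldest first; the function names are assumptions introduced for this example.

```python
from fractions import Fraction
from math import factorial


def compositions(n):
    """Yield every ordered tuple of positive integers summing to n."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def age_ordered_prob(counts, theta):
    """Probability (7.5) for age-ordered counts (n_(1), ..., n_(k))."""
    theta = Fraction(theta)
    n, k = sum(counts), len(counts)
    denom = Fraction(1)
    for i in range(n):
        denom *= theta + i  # builds S_n(theta)
    partial = 0
    for c in reversed(counts[1:]):
        partial += c  # n_(k), then n_(k) + n_(k-1), ..., down to n_(2)
        denom *= partial
    return theta ** k * factorial(n - 1) / denom


theta = Fraction(3, 2)  # arbitrary illustrative value
total = sum(age_ordered_prob(c, theta) for c in compositions(5))
print(total)
```

The enumeration is exponential in $n$, so this is suitable only as a check for small samples.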

There are many interesting results connecting the oldest allele in the
sample to the oldest allele in the population. For example, Kelly (1976)
showed that the probability that the oldest allele in the sample is
represented $j$ times in the sample is

$$\frac{\theta}{n} \binom{n}{j} \binom{n + \theta - 1}{j}^{-1}, \qquad j = 1, 2, \ldots, n. \tag{7.6}$$

He also showed that the probability that the oldest allele in the
population is observed at all in the sample is $n/(n + \theta)$. The probability
that a gene seen $j$ times in the sample is of the oldest allelic type in the
population is $j/(n + \theta)$. When $j = n$, so that there is only one allelic
type present in the sample, this probability is $n/(n + \theta)$. Donnelly (1986)
showed, more generally, that the probability that the oldest allele in the
population is observed $j$ times in the sample is

$$\frac{\theta}{n + \theta} \binom{n}{j} \binom{n + \theta - 1}{j}^{-1}, \qquad j = 0, 1, 2, \ldots, n. \tag{7.7}$$

This is of course closely connected to Kelly's result. For the case $j = 0$ the
probability (7.7) is $\theta/(n + \theta)$, confirming the complementary probability
$n/(n + \theta)$ found above. Conditional on the event that the oldest allele in
the population does appear in the sample, a straightforward calculation
using (7.7) shows that this conditional probability and that in (7.6) are
identical.
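The relation between the two results can be checked numerically. The sketch below takes Kelly's probability to be $\frac{\theta}{n}\binom{n}{j}\binom{n+\theta-1}{j}^{-1}$ for $j = 1, \ldots, n$ and Donnelly's to be the same expression with $\theta/(n+\theta)$ in place of $\theta/n$, for $j = 0, \ldots, n$. The binomial coefficient with non-integer upper argument is interpreted as a falling factorial divided by $j!$; that interpretation and the function names are assumptions made for this example.

```python
from fractions import Fraction
from math import comb


def gen_binom(a, j):
    """Generalized binomial coefficient a(a-1)...(a-j+1) / j!."""
    out = Fraction(1)
    for i in range(j):
        out *= (a - i) / Fraction(i + 1)
    return out


def kelly_prob(n, j, theta):
    """P(oldest allele in the sample is represented j times), j >= 1."""
    theta = Fraction(theta)
    return theta / n * comb(n, j) / gen_binom(n + theta - 1, j)


def donnelly_prob(n, j, theta):
    """P(oldest allele in the population is seen j times), j >= 0."""
    theta = Fraction(theta)
    return theta / (n + theta) * comb(n, j) / gen_binom(n + theta - 1, j)


n, theta = 6, Fraction(3, 2)  # arbitrary illustrative values
p_seen = 1 - donnelly_prob(n, 0, theta)  # equals n / (n + theta)
for j in range(1, n + 1):
    # Conditioning Donnelly's result on j >= 1 recovers Kelly's result.
    print(j, donnelly_prob(n, j, theta) / p_seen == kelly_prob(n, j, theta))
```

Both distributions sum to 1 over their respective ranges, which is a useful sanity check on the normalizing constants.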

It will be expected that various exact results hold for the Moran model,
with $\theta$ defined as $2Nu/(1 - u)$. The first of these is an exact
representation of the GEM distribution, analogous to (7.2). This has been
provided by Hoppe (1987). Denote by $N_1, N_2, \ldots$ the numbers of genes of
the oldest, second-oldest, ... alleles in the population. Then $N_1, N_2, \ldots$
can be defined in turn by

$$N_i = 1 + M_i, \qquad i = 1, 2, \ldots, \tag{7.8}$$

where $M_i$ has a binomial distribution with index
$2N - N_1 - N_2 - \cdots - N_{i-1} - 1$ and parameter $x_i$, where
$x_1, x_2, \ldots$ are i.i.d. continuous random variables each having the
density function (7.1). Eventually $N_1 + N_2 + \cdots + N_k = 2N$ and the
process stops, the final index $k$ being identical to the number $K_{2N}$ of
alleles in the population.

It follows directly from this representation that the mean of $N_1$ is

$$1 + (2N - 1)\,\theta \int_0^1 x (1 - x)^{\theta - 1}\,dx = \frac{2N + \theta}{1 + \theta}.$$
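The representation (7.8) translates directly into a simulation. The following is a minimal sketch under stated assumptions: `hoppe_counts` is a hypothetical name, the binomial draw is implemented naively as a sum of Bernoulli trials, and the parameter values and seed are arbitrary. It draws the age-ordered allele counts and compares the simulated mean of $N_1$ with $(2N + \theta)/(1 + \theta)$.

```python
import random


def hoppe_counts(two_n, theta, rng):
    """Draw age-ordered allele counts (N_1, N_2, ...) as in (7.8)."""
    counts = []
    remaining = two_n  # genes not yet assigned to an allele
    while remaining > 0:
        x = rng.betavariate(1.0, theta)  # density (7.1)
        # M_i ~ Binomial(remaining - 1, x), drawn as Bernoulli trials.
        m = sum(1 for _ in range(remaining - 1) if rng.random() < x)
        counts.append(1 + m)
        remaining -= 1 + m
    return counts


rng = random.Random(0)  # arbitrary seed
two_n, theta, reps = 20, 1.0, 4000
mean_n1 = sum(hoppe_counts(two_n, theta, rng)[0] for _ in range(reps)) / reps
print(mean_n1, (two_n + theta) / (1.0 + theta))
```

The length of the returned list is one realization of $K_{2N}$, the number of alleles in the population.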

If there is only one allele in the population, so that the population
is strictly monomorphic, this allele must be the oldest one in the
population. The above representation shows that the probability that the
oldest allele arises $2N$ times in the population is

$$\text{Prob}(M_1 = 2N - 1) = \theta \int_0^1 x^{2N-1} (1 - x)^{\theta - 1}\,dx,$$

and this reduces to the exact monomorphism probability

$$\frac{(2N - 1)!}{(1 + \theta)(2 + \theta) \cdots (2N - 1 + \theta)}$$

for the Moran model.

More generally, Kelly (1977) has shown that the probability that the
oldest allele in the population is represented by $j$ genes is, exactly,

$$\frac{\theta}{2N} \binom{2N}{j} \binom{2N + \theta - 1}{j}^{-1}. \tag{7.9}$$

The case $j = 2N$ considered above is a particular example of (7.9), and
the mean number $(2N + \theta)/(1 + \theta)$ also follows from (7.9).

We now consider 'age' questions. It is found that the mean time, into
the past, that the oldest allele in the population entered the population
(by a mutation event) is

$$\text{Mean age of oldest allele} = \sum_{j=1}^{2N} \frac{4N}{j(j + \theta - 1)} \text{ generations}. \tag{7.10}$$

It can be shown (see Watterson and Guess (1977) and Kelly (1977)) 
that not only the mean age of the oldest allele, but indeed the entire 
probability distribution of its age, is independent of its current frequency 
and indeed of the frequency of all alleles in the population. 

If an allele is observed in the population with frequency p, its mean 
age is 

$$\sum_{j=1}^{2N} \frac{4N}{j(j + \theta - 1)} \bigl(1 - (1 - p)^j\bigr) \text{ generations}. \tag{7.11}$$

This is a generalization of the expression in (7.10), since if $p = 1$ only
one allele exists in the population, and it must then be the oldest allele.

Our final calculation concerns the mean age of the oldest allele in a
sample of $n$ genes. This is

$$4N \sum_{j=1}^{n} \frac{1}{j(j + \theta - 1)} \text{ generations}. \tag{7.12}$$


Except for small values of $n$, this is close to the mean age of the oldest
allele in the population, given in (7.10). In other words, unless $n$ is small,
it is likely that the oldest allele in the population is represented in the
sample.
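The closeness of the sample and population mean ages is easy to see numerically. The sketch below assumes the three mean-age expressions take the forms $\sum_{j=1}^{2N} 4N/\{j(j+\theta-1)\}$ for the oldest allele in the population, the same sum with each term multiplied by $1 - (1-p)^j$ for an allele of frequency $p$, and $4N\sum_{j=1}^{n} 1/\{j(j+\theta-1)\}$ for the oldest allele in a sample. The function names and the values $N = 1000$, $\theta = 1$ are illustrative assumptions.

```python
def mean_age_given_frequency(p, theta, pop_size):
    """Mean age (generations) of an allele with population frequency p."""
    two_n = 2 * pop_size
    return sum(
        4.0 * pop_size / (j * (j + theta - 1.0)) * (1.0 - (1.0 - p) ** j)
        for j in range(1, two_n + 1)
    )


def mean_age_oldest_in_population(theta, pop_size):
    """Mean age of the oldest allele: the p = 1 case of the sum above."""
    return mean_age_given_frequency(1.0, theta, pop_size)


def mean_age_oldest_in_sample(n, theta, pop_size):
    """Mean age (generations) of the oldest allele in a sample of n genes."""
    return 4.0 * pop_size * sum(
        1.0 / (j * (j + theta - 1.0)) for j in range(1, n + 1)
    )


pop_size, theta = 1000, 1.0  # illustrative values only
population_age = mean_age_oldest_in_population(theta, pop_size)
for n in (2, 10, 50, 200):
    sample_age = mean_age_oldest_in_sample(n, theta, pop_size)
    print(n, round(sample_age, 1), round(sample_age / population_age, 4))
```

The printed ratio approaches 1 quickly as $n$ grows, because the terms of the sum decay like $1/j^2$.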

We have listed the various results given in this section not only because 
of their intrinsic interest, but because they form a natural lead-in to 
Kingman's celebrated coalescent process, to which we now turn. 



8 The coalescent 

The concept of the coalescent is now discussed at length in many text- 
books, and entire books (for example Hein, Schierup and Wiuf (2005) 
and Wakeley (2009)) and book chapters (for example Marjoram and 
Joyce (2009) and Nordborg (2001)) have been written about it. Here we 
can do no more than outline the salient aspects of the process. 

The aim of the coalescent is to describe the common ancestry of the
sample of $n$ genes at various times in the past through the concept of
an equivalence class. To do this we introduce the notation $\tau$, indicating
a time $\tau$ in the past (so that if $\tau_1 > \tau_2$, time $\tau_1$ is further in the
past than time $\tau_2$). The sample of $n$ genes is assumed taken at time
$\tau = 0$.

Two genes in the sample of $n$ are in the same equivalence class at
time $\tau$ if they have a common ancestor at this time. Equivalence classes
are denoted by parentheses: thus if $n = 8$ and at time $\tau$ genes 1 and 2
have one common ancestor, genes 4 and 5 a second, and genes 6 and 7 a
third, and none of the three common ancestors are identical and none is
identical to the ancestor of gene 3 or of gene 8 at time $\tau$, the equivalence
classes at time $\tau$ are

{(1, 2), (3), (4, 5), (6, 7), (8)}. (8.1)

We call any such set of equivalence classes an equivalence relation, and
denote any such equivalence relation by a Greek letter. As two particular
cases, at time $\tau = 0$ the equivalence relation is
$\phi_1$ = {(1), (2), (3), (4), (5), (6), (7), (8)}, and at the time of the most
recent common ancestor of all eight genes, the equivalence relation is
$\phi_n$ = {(1, 2, 3, 4, 5, 6, 7, 8)}. The Kingman coalescent process is a
description of the details of the ancestry of the $n$ genes moving from
$\phi_1$ to $\phi_n$. For example, given the equivalence relation in (8.1), one
possibility for the equivalence relation following a coalescence is
{(1, 2), (3), (4, 5), (6, 7, 8)}. Such an amalgamation is called
a coalescence, and the process of successive such amalgamations is called
the coalescence process.

Coalescences are assumed to take place according to a Poisson process,
but with a rate depending on the number of equivalence classes present.
Suppose that there are $j$ equivalence classes at time $\tau$. It is assumed
that no coalescence takes place between time $\tau$ and time $\tau + \delta\tau$ with
probability $1 - \frac{1}{2} j(j - 1)\,\delta\tau$. (Here and throughout we ignore
terms of order $(\delta\tau)^2$.) The probability that the process moves from one
nominated equivalence relation (at time $\tau$) to some nominated equivalence
relation which can be derived from it is $\delta\tau$. In other words, a
coalescence takes place in this time interval with probability
$\frac{1}{2} j(j - 1)\,\delta\tau$, and all of the $j(j - 1)/2$ amalgamations
possible at time $\tau$ are equally likely to occur.

In order for this process to describe the 'random sampling' evolutionary
model described above, it is necessary to scale time so that unit time
corresponds to $2N$ generations. With this scaling, the time $T_j$ between
the formation of an equivalence relation with $j$ equivalence classes and
one with $j - 1$ equivalence classes has an exponential distribution with
mean $2/\{j(j - 1)\}$.

The (random) time $T_{MRCAS} = T_n + T_{n-1} + T_{n-2} + \cdots + T_2$ until all
genes in the sample first had just one common ancestor has mean

$$E(T_{MRCAS}) = 2 \sum_{j=2}^{n} \frac{1}{j(j - 1)} = 2\Bigl(1 - \frac{1}{n}\Bigr). \tag{8.2}$$

(The suffix 'MRCAS' stands for 'most recent common ancestor of the
sample.) This is, of course, close to 2 coalescent time units, or $4N$
generations, when $n$ is large. Tavaré (2004) has found the (complicated)
distribution of $T_{MRCAS}$. Kingman (1982a,b,c) showed that for large
populations, many population models (including the 'random sampling' model)
are well approximated in their sampling attributes by the coalescent
process. The larger the population the more accurate is this approximation.
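The mean of $T_{MRCAS}$ can be illustrated by simulation: the waiting time while there are $j$ equivalence classes is exponential with rate $j(j-1)/2$ (mean $2/\{j(j-1)\}$), and the waiting times for $j = n, n-1, \ldots, 2$ are added. The sketch below is an illustration only; the function names, seed and number of replicates are assumptions.

```python
import random


def simulate_tmrca(n, rng):
    """One draw of T_MRCAS (coalescent time units) for a sample of n genes."""
    total = 0.0
    for j in range(n, 1, -1):
        rate = j * (j - 1) / 2.0  # coalescence rate with j classes
        total += rng.expovariate(rate)  # exponential with mean 2 / (j(j-1))
    return total


def mean_tmrca(n):
    """Exact mean: 2 * sum_{j=2}^{n} 1/(j(j-1)) = 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


rng = random.Random(0)  # arbitrary seed
n, reps = 10, 20000
estimate = sum(simulate_tmrca(n, rng) for _ in range(reps)) / reps
print(estimate, mean_tmrca(n))
```

Multiplying by $2N$ converts these coalescent time units into generations.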
We now introduce mutation into the coalescent. Suppose that the
probability that any particular ancestral gene mutates in the time
interval $(\tau + \delta\tau, \tau)$ is $\frac{\theta}{2}\,\delta\tau$. All mutants are
assumed to be of new allelic types (the infinitely many alleles assumption).
If at time $\tau$ in the coalescent there are $j$ equivalence classes, the
probability that either a mutation or a coalescent event had occurred in
$(\tau + \delta\tau, \tau)$ is

$$j\,\frac{\theta}{2}\,\delta\tau + \frac{j(j - 1)}{2}\,\delta\tau = \frac{1}{2}\, j(j + \theta - 1)\,\delta\tau. \tag{8.3}$$

We call such an occurrence a defining event, and given that a defining
event did occur, the probability that it was a mutation is
$\theta/(j + \theta - 1)$ and that it was a coalescence is $(j - 1)/(j + \theta - 1)$.

The probability that $k$ different allelic types are seen in the sample
is then the probability that $k$ of these defining events were mutations.
The above reasoning shows that this probability must be proportional
to $\theta^k/S_n(\theta)$, where $S_n(\theta)$ is defined below (2.2), the constant of
proportionality being independent of $\theta$. This argument leads to (2.3).
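The defining-event argument can be turned into a short computation. The sketch below rests on one reading of that argument, stated here as an assumption: each defining event, whether mutation or coalescence, reduces by one the number of lineages still being followed, so the number of alleles in the sample is a sum of independent indicators with success probabilities $\theta/(j + \theta - 1)$, $j = n, n-1, \ldots, 1$. It also takes $S_n(\theta) = \theta(\theta+1)\cdots(\theta+n-1)$. The function name is hypothetical.

```python
from fractions import Fraction
from math import factorial


def allele_count_distribution(n, theta):
    """Return {k: P(k alleles in a sample of n)} from defining events."""
    theta = Fraction(theta)
    dist = {0: Fraction(1)}
    for j in range(n, 0, -1):
        p_mut = theta / (j + theta - 1)  # defining event is a mutation
        new = {}
        for k, pr in dist.items():
            new[k + 1] = new.get(k + 1, 0) + pr * p_mut
            new[k] = new.get(k, 0) + pr * (1 - p_mut)
        dist = new
    return dist


n, theta = 5, Fraction(3, 2)  # arbitrary illustrative values
dist = allele_count_distribution(n, theta)
s_n = Fraction(1)
for i in range(n):
    s_n *= theta + i  # S_n(theta)
print(dist[n] == theta ** n / s_n)                  # all events mutations
print(dist[1] == theta * factorial(n - 1) / s_n)    # only the last one
```

Both extreme cases are proportional to $\theta^k/S_n(\theta)$ with a constant free of $\theta$, in line with the proportionality claimed in the text.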

Using these results and combinatorial arguments counting all possible
coalescent paths from a partition $(a_1, a_2, \ldots, a_n)$ back to the original
common ancestor, Kingman (1982a) was able to derive the more
detailed sample partition probability distribution (2.2), and deriving this
distribution from coalescent arguments is perhaps the most pleasing way
of arriving at it. For further comments along these lines, see Kingman
(2000).

The description of the coalescent given above follows the original
derivation given by Kingman (1982a). The coalescent is perhaps more
naturally understood as a random binary tree. These have now been
investigated in great detail: see for example Aldous and Pitman (1999).

Many genetic results can be obtained quite simply by using the
coalescent ideas. For example, Watterson and Donnelly (1992) used Kingman's
coalescent to discuss the question "Do Eve's Alleles Live On?" To
answer this question we assume the infinitely-many-neutral-alleles model
for the population and consider a random sample of $n$ genes taken at
time 'now'. Looking back in time, the ancestral lines of those genes
coalesce to the MRCAS, which may be called the sample's 'Eve'. Of course
if Eve's allelic type survives into the sample it would be the oldest, but it
may not have survived because of intervening mutation. If we denote by
$X_n$ the number of representative genes of the oldest allele, and by $Y_n$ the
number of genes having Eve's allele, then Kelly's result (7.6) gives the
distribution of $X_n$. We denote that distribution here by $p_n(j)$,
$j = 0, 1, 2, \ldots, n$, and the distribution of $Y_n$ by $q_n(j)$,
$j = 0, 1, 2, \ldots, n$. Unlike the simple explicit expression for $p_n(j)$,
the corresponding expression for $q_n(j)$ is very complicated: see (2.14) and
(2.15) in Watterson and Donnelly (1992), derived using some of Kingman's
(1982a) results. Using the relative probabilities of a mutation or a
coalescence at a defining event gives rise to a recurrence equation for
$q_n(j)$, $j = 0, 1, 2, \ldots, n$ as

$$[n(n - 1) + j\theta]\, q_n(j) = n(j - 1)\, q_{n-1}(j - 1) + n(n - j - 1)\, q_{n-1}(j) + (j + 1)\theta\, q_n(j + 1) \tag{8.4}$$

for $j = 0, 1, 2, \ldots, n$ (provided that we interpret $q_n(j)$ as zero outside
this range), and for $n = 2, 3, \ldots$. The boundary conditions $q_1(j) = 1$
for $j = 1$, $q_1(j) = 0$ for $j > 1$, and

$$q_n(n) = p_n(n) = \prod_{k=2}^{n} \frac{k - 1}{k + \theta - 1}$$

apply, the latter because if $X_n = n$ then all sample genes descend from
a gene having the oldest allele, and 'she' must be Eve. The recurrence
(8.4) is a special case of one found by Griffiths (1989) in his equation
(3.7).

The expected number of genes of Eve's allelic type was given by
Griffiths (1986) (see also Beder (1988)) as

$$E(Y_n) = \sum_{j=1}^{n} j\, q_n(j) = n \prod_{j=2}^{n} \frac{j(j - 1)}{j(j - 1) + \theta}. \tag{8.5}$$
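For each $n$, the recurrence (8.4) expresses $q_n(j)$ in terms of $q_n(j+1)$ and the distribution for $n - 1$, so it can be solved by stepping $j$ downwards from $n$ to 0. The sketch below does this in exact arithmetic and compares the resulting mean with the product form $n\prod_{j=2}^{n} j(j-1)/\{j(j-1)+\theta\}$. It is an illustration only: the function names are hypothetical, and the starting value $q_1(0) = 0$ is an assumption (a sample of one gene is its own 'Eve').

```python
from fractions import Fraction


def eve_allele_distributions(n_max, theta):
    """Return {n: [q_n(0), ..., q_n(n)]} for n = 1..n_max by solving (8.4)."""
    theta = Fraction(theta)
    table = {1: [Fraction(0), Fraction(1)]}  # q_1(1) = 1; q_1(0) = 0 assumed
    for n in range(2, n_max + 1):
        prev = table[n - 1] + [Fraction(0)]  # q_{n-1}(n) = 0, outside range
        q = [Fraction(0)] * (n + 2)          # q[n + 1] stays 0, outside range
        for j in range(n, -1, -1):           # q_n(j) follows from q_n(j + 1)
            rhs = n * (n - j - 1) * prev[j] + (j + 1) * theta * q[j + 1]
            if j >= 1:
                rhs += n * (j - 1) * prev[j - 1]
            q[j] = rhs / (n * (n - 1) + j * theta)
        table[n] = q[: n + 1]
    return table


def expected_eve_copies(n, theta):
    """Product form for the mean number of genes of Eve's allelic type."""
    theta = Fraction(theta)
    out = Fraction(n)
    for j in range(2, n + 1):
        out *= Fraction(j * (j - 1)) / (j * (j - 1) + theta)
    return out


table = eve_allele_distributions(6, 1)  # theta = 1, illustrative
for n, q in table.items():
    mean = sum(j * qj for j, qj in enumerate(q))
    print(n, sum(q), mean == expected_eve_copies(n, 1))
```

Each row sums to 1 and its mean matches the product form, which is a useful consistency check between the recurrence and the expectation.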

Watterson and Donnelly (1992) gave some numerical examples, some
asymptotic results, and some bounds for the distribution $q_n(j)$,
$j = 0, 1, 2, \ldots, n$. One result of interest is that $q_n(0)$, the
probability of Eve's allele being extinct in the sample, increases with $n$,
to $q_\infty(0)$ say. One reason for this is that a larger sample may well have
its 'Eve' further back in the past than a smaller sample. We might interpret
$q_\infty(0)$ as being the probability that an infinitely large population has
lost its 'Eve's' allele. Note that the bounds

(2 + ^)(l + ^) <^-W^^^^' (8-6) 

for $0 < \theta < \infty$, indicate that for all $\theta$ in this range, $q_\infty(0)$ is
neither 0 nor 1. Thus, in contrast to the situation in branching processes,
there are no sub-critical or super-critical phenomena here.



9 Other matters 

There are many other topics that we could mention in addition to those 
described above. On the mathematical side, the Kingman distribution 
has a close connection to prime factorization of large integers. On the 
genetical side, we have not mentioned the 'infinitely many sites' model, 
now frequently used by geneticists, in which the DNA structure of the 
gene plays a central role. It is a tribute to Kingman that his work opened
up more topics than can be discussed here.